4  Further Exercises

Exercise 1

The file “NHSScotland.txt” contains data on the number of patients attending A&E every month in each of the 14 Scottish NHS boards, from 2007 up to 2023. This data set has the following variables:

  • “Date”: the end of the month that patient numbers are aggregated over.
  • “NHSBoard”: the Scottish NHS board the patients are from.
  • “TotalAttendances”: the total number of patients attending A&E in a given month and NHS board.
  • “Within4Hours”: the number of patients whose wait time was less than 4 hours.
  • “Over4Hours”: the number of patients whose wait time was greater than 4 hours.
  • “Over8Hours”: the number of patients whose wait time was greater than 8 hours.
  • “Over12Hours”: the number of patients whose wait time was greater than 12 hours.
    1. Read “NHSScotland.txt” into R and save it as a data frame called nhs.

    2. Convert the column “NHSBoard” to a factor. (Hint: you can see the names of all the Scottish NHS boards using the code unique(nhs$NHSBoard).)

    3. Add a column to nhs giving the percentage of all patients attending A&E whose wait time was less than 4 hours. Call this new variable “PercentageWithin4Hours”.

    4. What is the average percentage of patients who waited less than 4 hours in each of the 14 Scottish NHS boards? (Hint: think about how you can use the tapply() function.)

    5. Create a new data frame called glasgow that is a subset of nhs, containing only the observations from NHS Greater Glasgow & Clyde and only the variables “Date”, “TotalAttendances” and “Over4Hours”.

    6. Sort glasgow in decreasing order of the number of patients who waited more than 4 hours in A&E. In which month did the greatest number of patients wait longer than 4 hours?

    7. The file “HBPopulation.csv” contains the population size (in 2021) of each of the 14 Scottish NHS boards. Read this file into R and save it as a data frame called population.

       After reading Appendix B, merge nhs and population so that the A&E attendance figures and each health board's population size appear in the same data frame.
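One possible approach to the tasks that build nhs (reading the file, converting NHSBoard to a factor, adding PercentageWithin4Hours and averaging it per board with tapply()), sketched in base R. The file layout is an assumption: the code supposes “NHSScotland.txt” is tab-separated with a header row, since board names such as NHS Greater Glasgow & Clyde contain spaces; check the file and change `sep` if it differs. The helper names read_nhs(), prepare_nhs() and board_means() are hypothetical, introduced only for this sketch.

```r
# ASSUMPTION: the file is tab-separated with a header row; adjust `sep` if not.
read_nhs <- function(path = "NHSScotland.txt") {
  read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
}

# Hypothetical helper: convert the board column and add the percentage column.
prepare_nhs <- function(nhs) {
  # store the board names as a factor (one level per board)
  nhs$NHSBoard <- factor(nhs$NHSBoard)
  # percentage of each month's attendances seen within 4 hours
  nhs$PercentageWithin4Hours <- 100 * nhs$Within4Hours / nhs$TotalAttendances
  nhs
}

# Hypothetical helper: tapply() splits the percentages by the levels of
# NHSBoard and applies mean() to each group; na.rm = TRUE skips any
# missing months instead of returning NA for that board.
board_means <- function(nhs) {
  tapply(nhs$PercentageWithin4Hours, nhs$NHSBoard, mean, na.rm = TRUE)
}

# Usage (needs the data file in the working directory):
# nhs <- prepare_nhs(read_nhs())
# board_means(nhs)
```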
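For the Glasgow subset and the sort, a sketch using bracket indexing and order(). It assumes the board is labelled exactly “NHS Greater Glasgow & Clyde” in the data (check with unique(nhs$NHSBoard)); make_glasgow() and sort_by_over4() are hypothetical helper names.

```r
# Hypothetical helper: keep one board's rows and only the three requested columns.
# ASSUMPTION: the label below matches the data exactly.
make_glasgow <- function(nhs, board = "NHS Greater Glasgow & Clyde") {
  nhs[nhs$NHSBoard == board, c("Date", "TotalAttendances", "Over4Hours")]
}

# Hypothetical helper: order() returns row positions sorted by Over4Hours;
# decreasing = TRUE puts the largest value first.
sort_by_over4 <- function(glasgow) {
  glasgow[order(glasgow$Over4Hours, decreasing = TRUE), ]
}

# Usage:
# glasgow <- sort_by_over4(make_glasgow(nhs))
# glasgow$Date[1]  # month with the most patients waiting over 4 hours
```

The functions only package the two indexing expressions; at the console they can be typed directly.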
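A sketch of reading the population file and joining it to nhs with base R's merge(). The exercise does not say what the board column in “HBPopulation.csv” is called, so `pop_key` (default "NHSBoard") is a guess to be checked against names(population); Appendix B may describe a different approach, and read_population() and merge_population() are hypothetical helpers.

```r
# ASSUMPTION: the CSV has a header row.
read_population <- function(path = "HBPopulation.csv") {
  read.csv(path, stringsAsFactors = FALSE)
}

# Hypothetical helper: match rows on the board name. Each monthly row of
# nhs picks up its board's 2021 population. ASSUMPTION: `pop_key` names
# the board column in population; pass the real name if it differs.
merge_population <- function(nhs, population, pop_key = "NHSBoard") {
  merge(nhs, population, by.x = "NHSBoard", by.y = pop_key)
}

# Usage:
# population <- read_population()
# nhs_pop <- merge_population(nhs, population)
```

By default merge() keeps only rows whose board names match exactly in both data frames, so a board spelt differently in the two files silently disappears; comparing nrow(nhs) with the number of rows in the merged result is a quick check.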